A comparison of selective classification methods in DNA microarray data of cancer: some recommendations for application in health promotion.

Authors

  • Tohid Jafari Koshki
  • Ebrahim Hajizadeh
  • Mehrdad Karimi
Abstract

BACKGROUND: The aim of this study was to apply a new method for selecting a few genes, out of thousands, as plausible markers of a disease.

METHODS: Hierarchical clustering was used together with Support Vector Machine (SVM) and Naïve Bayes (NB) classifiers to select marker genes for three types of breast cancer. At each step of this method, one subject is left out, and the algorithm iteratively selects clusters of genes from the remaining subjects and picks a representative gene from each cluster. Classifiers are then constructed from these genes, and the accuracy of each classifier in predicting the class of the left-out subject is recorded. The classifier with the higher precision is considered superior.

RESULTS: Combining classification techniques with the clustering method yielded fewer genes with a high degree of statistical precision. Although every classifier selected only a few genes from the predetermined highly ranked genes, precision did not decrease. SVM precision was 100% with 22 genes instead of 50, while NB reached a higher precision of 97.95% in this case. When 20 highly ranked genes were selected and fed to the algorithm, the same precision was obtained using 6 and 5 genes with the SVM and NB classifiers, respectively.

CONCLUSION: The hybrid method could be effective in choosing a smaller number of plausible marker genes while increasing the classification precision of these markers. In addition, the method makes it possible to detect new plausible markers whose association with the disease under study has not been biologically proven.
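The leave-one-out procedure described under METHODS can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the linkage, the distance measure, or how a representative gene is chosen, so the sketch assumes average linkage, a distance of 1 - |Pearson r|, and the cluster medoid as representative. It uses only a Gaussian Naïve Bayes classifier (the SVM and the gene pre-ranking step are omitted), and all function names and the toy data are hypothetical.

```python
import math
import random
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def cluster_genes(X, n_clusters):
    """Average-linkage agglomerative clustering of gene columns.

    Assumed distance: 1 - |r|. Returns (clusters, distance matrix).
    """
    n_genes = len(X[0])
    cols = [[row[g] for row in X] for g in range(n_genes)]
    dist = [[1 - abs(pearson(cols[i], cols[j])) for j in range(n_genes)]
            for i in range(n_genes)]
    clusters = [[g] for g in range(n_genes)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = mean(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # merge the closest pair (b > a)
    return clusters, dist


def representatives(clusters, dist):
    """Pick the medoid of each cluster (assumed selection rule)."""
    return [min(c, key=lambda g: sum(dist[g][h] for h in c)) for c in clusters]


def nb_predict(X_train, y_train, x_new, genes):
    """Gaussian Naive Bayes prediction restricted to the selected genes."""
    best_label, best_score = None, -math.inf
    for label in sorted(set(y_train)):
        rows = [r for r, y in zip(X_train, y_train) if y == label]
        score = math.log(len(rows) / len(X_train))  # class prior
        for g in genes:
            vals = [r[g] for r in rows]
            mu, var = mean(vals), pvariance(vals) + 1e-6  # smoothed variance
            score += -0.5 * math.log(2 * math.pi * var) - (x_new[g] - mu) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def loo_accuracy(X, y, n_clusters):
    """Leave one subject out, select genes on the rest, predict the left-out subject."""
    hits = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        clusters, dist = cluster_genes(X_tr, n_clusters)
        genes = representatives(clusters, dist)
        hits += nb_predict(X_tr, y_tr, X[i], genes) == y[i]
    return hits / len(X)


def toy_data(seed=0, per_class=6):
    """Synthetic 3-class data: two correlated gene pairs plus two noise genes."""
    rng = random.Random(seed)
    X, y = [], []
    for label, (a, b) in enumerate([(0, 0), (4, 0), (0, 4)]):
        for _ in range(per_class):
            s1, s2 = a + rng.gauss(0, 0.5), b + rng.gauss(0, 0.5)
            X.append([s1, s1 + rng.gauss(0, 0.1), s2, s2 + rng.gauss(0, 0.1),
                      rng.gauss(0, 1), rng.gauss(0, 1)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = toy_data()
    print("leave-one-out accuracy:", loo_accuracy(X, y, n_clusters=4))
```

Gene selection is repeated inside every leave-one-out fold, so the left-out subject never influences which genes are chosen. Comparing classifiers, as the abstract does for SVM and NB, would amount to swapping `nb_predict` for another predictor and comparing the resulting accuracies.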


Related resources

Application of clustering methods in DNA microarrays

Background: DNA microarray technology has paved the way for investigators to examine the expression of thousands of genes in a short time. Analysis of this large volume of raw data includes normalization, clustering and classification. The present study surveys the application of clustering techniques in DNA microarray analysis. Materials and methods: We analyzed data from the study by Van’t Veer et al. dealing with BRCA1...

SFLA Based Gene Selection Approach for Improving Cancer Classification Accuracy

In this paper, we propose a new gene selection algorithm based on the Shuffled Frog Leaping Algorithm, called SFLA-FS. The proposed algorithm is used to improve cancer classification accuracy. Most biological datasets, such as cancer datasets, have a large number of genes and few samples. However, most of these genes are not useful for some tasks, for example cancer classification....

Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine

DNA microarray gene expression data give access to a wealth of information on thousands of variables (genes). Analysis of this information can reveal the genetic causes of disease and the differences between tumors. In this study we try to reduce the high-dimensional data with a statistical method, selecting valuable, high-impact genes as biomarkers, and then classify ovarian tumors based on gene expression data of...

Classification and Biomarker Genes Selection for Cancer Gene Expression Data Using Random Forest

Background & objective: Microarray and next generation sequencing (NGS) data are important sources for finding helpful molecular patterns. The large volume of gene expression data also makes it harder to identify the biomarkers associated with cancer. The random forest (RF) is used to effectively analyze the problems of large-p and smal...

Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis

Recognizing genes with distinctive expression levels can help in the prevention, diagnosis and treatment of diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering gene expression datasets. Fast GKM is a significant improvement on the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...


Journal:
  • Health promotion perspectives

Volume 3, Issue 1

Publication date: 2013